Real-time agricultural object detection and display

ABSTRACT

A computer-implemented method includes performing, using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard an agricultural vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and generating and displaying, in real-time, the subset of data to a user interface, thereby enabling a user interaction with the subset of data.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent document claims the benefit of priority of U.S. Provisional Patent Application 63/306,852, entitled “TRAINING AN AGRICULTURAL IMAGE PROCESSING SYSTEM BASED ON USER FEEDBACK,” filed on Feb. 4, 2022. The entirety of the aforementioned patent application is incorporated herein by reference.

TECHNICAL FIELD

The present patent document relates to machine learning and robotic implementation of agricultural activities.

BACKGROUND

Global human population growth is expanding at a rate projected to reach 10 billion or more persons within the next 40 years, which, in turn, will concomitantly increase demands on producers of food. To support such population growth, food production, for example on farms and orchards, needs to generate collectively an amount of food that is equivalent to the amount that the entire human race, from the beginning of time, has consumed up to that point in time. Many obstacles and impediments, however, likely need to be overcome or resolved to feed future generations in a sustainable manner.

To support such an increase in demand, agricultural technology has been implemented to more effectively and efficiently grow crops, raise livestock, and cultivate land. Such technology in the past has helped to more effectively and efficiently use labor, use tools and machinery, and reduce the amount of chemicals used on plants and cultivated land.

However, many techniques currently used for producing and harvesting crops are only incremental steps from a previous technique. The amount of land, chemicals, time, labor, and other costs to the industry still poses a challenge. A new and improved system and method of performing agricultural services is needed.

SUMMARY

Techniques for detection of and controlling growth of undesirable vegetation in a field are described.

In one example aspect, a computer-implemented method of sensor input processing includes performing, using a processor onboard a vehicle, a machine learning (ML) processing on sensor inputs from sensors onboard the vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and generating the subset of data for modifying the ML processing for a subsequent use. The subsequent use may be either in a same agricultural environment that the vehicle is operating in or in a different agricultural environment.

In another example aspect, a method includes performing, using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard the vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and generating and displaying, in real-time, the subset of data to a user interface, thereby enabling a user interaction with the subset of data.

In another example aspect, a processor-implemented method includes performing, using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard the vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and providing the subset of data to a user interface.

In another example aspect, a computer-implemented method of processing agricultural images is disclosed. The method includes capturing, using one or more cameras deployed on an agricultural platform, images of an agricultural environment in which the agricultural platform is operating; annotating the images using a first machine learning (ML) algorithm, wherein each annotation includes a confidence number associated with the annotation, and wherein the first ML algorithm is trained based on a second set of images captured by the one or more cameras; generating a subset of the images as a training set for further training of the first ML algorithm, wherein the subset of the images in the training set is generated partially based on a user feedback; and transmitting the subset of images to a training platform that generates a second ML model by training of the first ML algorithm using the subset of image data.

In one example aspect, another computer-implemented method of processing agricultural images is disclosed. The method includes annotating agricultural images using N machine learning (ML) models, where N is an integer greater than 1; presenting results of annotation by the N ML models on a user interface; receiving a user feedback for the results of annotation by the N ML models; and generating a set of training images based on the user feedback.

In one example aspect, another computer-implemented method of processing agricultural images is disclosed. The method includes capturing real-world images of an agricultural environment, wherein each real-world image comprises a plurality of pixels, wherein a corresponding depth value and corresponding one or more color values are associated with each pixel of the plurality of pixels; and detecting one or more agricultural objects of interest in the real-world images by applying multiple image processing schemes to the real-world images, wherein the multiple image processing schemes include a depth value-based image processing scheme and a color value-based image processing scheme, wherein at least one of the multiple image processing schemes uses a machine learning (ML) model.

In another example aspect, a computer-implemented method of sensor input processing, implemented by an agricultural platform comprising a processor and a sensor, is disclosed. The method includes receiving sensor input from the sensor; processing the sensor input by multiple machine learning (ML) algorithms, each using a corresponding ML model for generating labels for objects identified in the sensor input; combining labels generated by each ML algorithm to generate a super-imposed labeled sensor input frame; comparing outputs of the ML algorithms to determine similarities or differences; and using results of the comparing for improving an operational characteristic of the sensor input processing.

In another aspect, another method of processing agricultural images is disclosed. The method includes comparing object detections performed by multiple image processing schemes to determine a set of ground truth images from which at least one machine learning (ML) model used by at least one ML algorithm included in the multiple image processing schemes is trained, wherein the multiple image processing schemes include two or more of: (a) an image processing scheme that includes a cascade of multiple ML algorithms; (b) an image processing scheme that includes image annotation based on user feedback; (c) an image processing scheme that includes a cascade of an ML algorithm or a computer vision (CV) algorithm and a user feedback.

In another example aspect, an apparatus is disclosed. The apparatus may be used as an agricultural vehicle and comprises a processor.

In another example aspect, a computer-readable medium is disclosed. The computer-readable medium stores processor-executable code that, upon execution, causes a processor to implement a method disclosed in the present document.

These, and other, aspects are described throughout the present document.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure will become better understood from the detailed description and the drawings.

FIG. 1 shows an example of an agricultural automation system.

FIG. 2 is a flowchart of an example method of processing agricultural images.

FIG. 3 shows an example system for real-time training of an image processing system.

FIG. 4A shows an example configuration of an agricultural image processing system.

FIG. 4B shows another example configuration of an agricultural image processing system.

FIG. 4C shows another example configuration of an agricultural image processing system.

FIG. 4D is a flowchart of an example method of processing agricultural images.

FIG. 5 is a flowchart of an example method of real-time training of machine learning algorithms for agricultural image processing.

FIG. 6 shows an example of a multi-model ML system for processing of agricultural images.

FIG. 7 is a flowchart of an example method of multi-model training.

FIG. 8A shows an example of a multi-layer image processing system.

FIG. 8B shows an example processing scheme of agricultural sensor input at different levels of detail.

FIG. 9 is a flowchart of an example method of processing agricultural images.

FIG. 10 shows an example of training ML models for agricultural image processing.

FIG. 11 is a flowchart of an example method of processing agricultural images.

FIG. 12 is a flowchart of an example method of processing agricultural images.

FIG. 13 is a flowchart of an example method of processing agricultural images.

FIGS. 14-18 are flowchart examples of methods described in the present document.

DETAILED DESCRIPTION

In this specification, reference is made in detail to specific embodiments of the disclosure. Some of the embodiments or their aspects are illustrated in the drawings.

For clarity in explanation, the present document has been described with reference to specific embodiments; however, it should be understood that the disclosure is not limited to the described embodiments. On the contrary, the disclosure covers alternatives, modifications, and equivalents as may be included within its scope as defined by any patent claims. The following embodiments of the disclosure are set forth without any loss of generality to, and without imposing limitations on, the claimed disclosure. In the following description, specific details are set forth in order to provide a thorough understanding of the present disclosure. The present disclosure may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to avoid unnecessarily obscuring the disclosure.

In addition, it should be understood that steps of the exemplary methods set forth in this exemplary patent can be performed in different orders than the order presented in this specification. Furthermore, some steps of the exemplary methods may be performed in parallel rather than being performed sequentially. Also, the steps of the exemplary methods may be performed in a network environment in which some steps are performed by different computers in the networked environment.

Some embodiments are implemented by a computer system. A computer system may include a processor, a memory, and a non-transitory computer-readable medium. The memory and non-transitory medium may store instructions for performing methods and steps described herein. Various examples and embodiments described below relate generally to robotics, autonomous driving systems, and autonomous agricultural application systems, such as an autonomous agricultural observation and treatment system, utilizing computer software and systems, computer vision and automation to autonomously identify an agricultural object, including any and all unique growth stages of agricultural objects identified, including crops or other plants or portions of a plant, characteristics and objects of a scene or geographic boundary, environment characteristics, or a combination thereof.

INTRODUCTION

Automation of agricultural tasks holds the promise of various operational and efficiency improvements to the agricultural and farming industry. For example, various agricultural tasks may be performed by machines that are controlled by a computer system. One technology that enables automation of agricultural tasks is the ability of a computer to analyze sensor input such as visual images of a farm and identify various objects in order to decide which tasks to perform and which objects should be subject to these tasks. The present document provides techniques that may be used in such computer systems to alleviate computational complexity and provide accurate execution of object identification and subsequent agricultural tasks.

U.S. patent application Ser. No. 17/506,588, filed on Oct. 20, 2021, entitled “Autonomous Detection and Control of Vegetation,” incorporated herein by reference in its entirety, describes various embodiments of an agricultural platform that includes cameras that capture images of an agricultural environment, and a machine learning (ML) model-based computational framework that analyzes the captured images and controls a treatment mechanism that treats agricultural objects, such as by spraying a pesticide or targeting unwanted vegetation with laser beams. For example, in some embodiments, an ML algorithm may be used to process images to identify various plants, fruits, flowers, weeds, and other objects typically found in an agricultural environment using one or more ML models. During a run of the agricultural platform in a real-world agricultural setting, the onboard cameras may capture images that may be used for further improvements to the ML models to be able to achieve a desired task (e.g., identify weeds as being different from crop, or identify fruits, etc.).

During the operation of such an agricultural platform, the number of images captured and the corresponding pixel data generated could run into terabytes on a daily basis. Such an amount of data is impractical for use in training the ML model. For example, training of the ML model may need to be performed using a select few images that would be considered ground truth. The present document provides techniques that may be used for (1) reducing the number of images used for training and/or (2) selecting images suitable for ML model training when generating the reduced set of images.

In some cases, it may be desirable to not just identify various agricultural objects, but also to obtain an understanding of objects in various stages of development. For example, an ML model may be trained to discern between a fruit (e.g., an apple) and leaves. However, it may also be desirable to identify a growth stage of the apple using the ML model (e.g., is the apple ripe?). In some cases, training an ML algorithm to be able to annotate image data may therefore need training by user feedback, in which a user provides supervision to the training performed by an ML model to allow the ML model to improve its accuracy and also the granularity of labeling various object characteristics.

In some embodiments, multiple ML models may be used to process a same set of images. These ML models may be different trained versions of a same underlying model or may be models that are created using a different set of hyperparameters. Results from the multiple ML models may be compared for similarities and differences. Such similarities and differences may be highlighted to a human user. The human user may provide feedback regarding accuracy of identification or classification, which may be used to further train the ML models. In some cases, ML models may entirely fail to identify an object with a high degree of accuracy (e.g., probability of accuracy above a threshold). In such cases, the human user may be prompted in real-time in the field to check the location of the undetected object and provide feedback to the ML model to allow improvements in the ML model. Thus, an active learning procedure may be implemented in which an ML algorithm may request user feedback for a portion of an image to better train the ML model in real-time during an ongoing operation in an agricultural environment.

An Example Embodiment

FIG. 1 shows an example block diagram of an agricultural automation system. An agricultural computer platform (1102) may be deployed in an agricultural environment and may comprise a number of agricultural vehicles that operate in the agricultural setting and collect images during ongoing operation of the agricultural vehicle (e.g., a tractor equipped with a computer platform) in the field. A manifest may be generated for these images during the extraction 1104. Images may be stored into a database 1108 that may be used to store the captured images from multiple cameras and sensors fitted on the agricultural vehicles 1106 that have undergone processing such as annotations (1106A), object detection information (1106B), or a manifest generation (1106C). A transfer function 1110 may be used to identify which images in the database 1108 (or a corresponding manifest) are new and suitable for transfer to an offline computing platform 1120 for additional processing. The transfer function 1110 may be operated to monitor images for changes in manifest (1112A) and to send images from the manifest (1112B). For example, the manifest may make it unnecessary to perform actual duplication of images on various databases, where images may be simply shared based on manifest information such as a sequence number, a geographic location where captured, time base information for synchronization, etc.
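
As an illustration of how a manifest can stand in for the image data itself, the following minimal Python sketch shows a hypothetical manifest record and a transfer-function-style filter; all field and function names here are illustrative assumptions, not part of the disclosed system.

```python
from dataclasses import dataclass

@dataclass
class ManifestEntry:
    """Hypothetical manifest record describing one captured image."""
    sequence_number: int     # order of capture within a run
    image_id: str            # key into the shared image database
    latitude: float          # geographic location where captured
    longitude: float
    capture_time_ns: int     # time base information for synchronization
    annotated: bool = False  # whether annotations (cf. 1106A) exist yet

def new_entries(local: list[ManifestEntry], remote_ids: set[str]) -> list[ManifestEntry]:
    """Transfer-function-style filter: select entries the offline platform lacks."""
    return [e for e in local if e.image_id not in remote_ids]

# Example: only the second entry is new and would be sent (cf. 1112B).
entries = [
    ManifestEntry(1, "img-0001", 36.77, -119.41, 1_700_000_000_000),
    ManifestEntry(2, "img-0002", 36.77, -119.42, 1_700_000_033_000),
]
print(new_entries(entries, remote_ids={"img-0001"}))
```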

The offline computing platform 1120 may be located in a computing facility or in the cloud 1140. The offline computing platform 1120 may operate to receive the images collected and processed in the field through a mailbox module (1122), retrieve images (1124), and store the retrieved images to a database (1126). The retrieved images and retrieved annotations (1134) may be used to write to a database 1132 (e.g., a Mongo database). The images from the database 1132 may be used to perform further training tasks 1130 by sending them to a labeling function 1128. The labeling function 1128 may monitor the images or the manifest in the database 1126 to identify which images should be sent to an annotation service 1136 to perform labeling on the images. The annotation service 1136 may work on images (or the manifest) in the cloud 1140 based on the information received from the labeling function 1128. The annotated data may be, for example, data in which boxes have been drawn around an object, or according to different classes. These classes may be based on a type of agricultural object, age of the object, ripeness state of the object, and so on. The labeled images used for training may include location data such as global positioning system data indicating where the images were captured. In some embodiments, images may be stored in a single database, and various functional blocks in the system depicted in FIG. 1 may communicate image data with each other by using a manifest file, an index file, a pointer, and such, to indicate images in the database rather than transmitting actual image data to each other.

FIG. 2 shows another example method 1200 of processing agricultural images according to various embodiments described herein. The method 1200 starts at 1220. In various embodiments, the method may be implemented by an agricultural vehicle such as a tractor, a drone, a manually operated vehicle, or a self-driving vehicle, and in particular one or more modules disposed on the vehicle for surveying a farm or for providing a treatment. The vehicle may be called “biff” as an abbreviation. At 1230, images or sensor readings are received at the vehicle. At 1240, received images are analyzed either by a human or by a machine (e.g., using a machine learning (ML) or a computer vision algorithm) to label certain regions of the image as including agricultural objects of interest. The result of the analysis may be sent for training the ML algorithm (e.g., based on human-identified ground truth) or may be displayed to a human operator (e.g., in case the ML/CV is not able to establish the identity of an object). At 1250, feedback received from the human or from additional processing is used for applying corrections and/or making changes to labels applied to the identified objects. At 1260, object classification may be performed based on a type of agricultural object, such as a veggie segmentation, a partial segmentation, a height segmentation, background segmentation, etc., and further labeling may be performed. Here, veggie segmentation may mean identifying all veggies into a segment or grouping veggies according to veggie identity into segments (e.g., all cabbage, all cauliflower, all carrots, etc.). Here, a partial segmentation may be performed based on a specific dimension—e.g., segmenting the original frame into square or rectangular tiles. Here, a height segmentation may comprise segmenting the images or sensor readings into “strips” or slices, such as a strip of image that contains a vegetation row, as sketched below. Here, background segmentation may mean identifying image portions that are neither soil nor weed nor plants, e.g., simply “nothing,” and removing such image portions from further processing. At 1270, the labeled image portions may be associated via a manifest and may be uploaded to a database. Further details of the method 1200 are disclosed throughout the present document.
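
The partial (tile) and height (strip) segmentations mentioned above can be sketched in a few lines; the tile and strip sizes and the NumPy array representation are assumptions for illustration only.

```python
import numpy as np

def partial_segmentation(frame: np.ndarray, tile: int = 256):
    """Split a frame into square tiles (partial segmentation)."""
    h, w = frame.shape[:2]
    return [frame[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

def height_segmentation(frame: np.ndarray, strip_height: int = 512):
    """Split a frame into horizontal strips, e.g., one per vegetation row."""
    h = frame.shape[0]
    return [frame[r:r + strip_height] for r in range(0, h, strip_height)]

frame = np.zeros((2160, 3840, 3), dtype=np.uint8)  # one 4K frame
print(len(partial_segmentation(frame)), "tiles;",
      len(height_segmentation(frame)), "strips")
```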

Image Labeling

In an agricultural application, classification of agricultural objects based on various object attributes relevant to the agricultural setting may be performed. In some embodiments, labeling may be used to classify objects according to various attributes. As disclosed throughout the present document, the attributes may include pixel color, pixel depth, object position, and a relationship between the object in one image and whether or not the object is in an expected location in an image, based on one or more previous images and one or more next images (e.g., solid objects usually only either undergo translational motion or rotational motion from one image to a next image frame). In some cases, the labeling may be unsupervised. Alternatively, in some cases, labeling may be supervised, where user feedback may be received for any corrections that may be used for further training of the labeling implementation.

In some implementations, labeling may be achieved by drawing bounding boxes around pixels. In some implementations, labeling may comprise generating a manifest that associates an attribute with every pixel or region of an image. In some cases, labeling may associate a classification with a labeled object, wherein the classification may be based on attributes.

In some cases, labeling may be performed by layering. Here, layering may refer to treating all pixels in an image as a main layer, and pixels that share a same attribute as a layer. For example, layering may include multiple layers—each representing a color or a depth range of pixels. As another example, layering may include pixels inside bounding boxes in one layer and pixels outside bounding boxes in another layer, and so on. In some embodiments, a human user may first perform image labeling (e.g., bounding boxes), which may be followed by a machine-based labeling. Alternatively, ML-based labeling may first be performed, followed by a human user feedback. In some embodiments, a semantic segmentation of images may be performed first. In some embodiments, image regions may be first identified for further processing, and the further processing may be performed only on the identified image regions (e.g., semantic segmentation).
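
A minimal sketch of layering, assuming an RGB NumPy image; the layer names and color thresholds are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def color_layers(rgb: np.ndarray) -> dict[str, np.ndarray]:
    """Boolean masks, one layer per coarse color attribute (illustrative thresholds)."""
    r, g, b = (rgb[..., i].astype(int) for i in range(3))
    green = (g > r + 20) & (g > b + 20)  # likely vegetation
    return {"main": np.ones(rgb.shape[:2], bool), "green": green, "background": ~green}

def box_layer(shape: tuple[int, int], boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Layer of pixels inside bounding boxes given as (x0, y0, x1, y1)."""
    mask = np.zeros(shape, bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask

img = np.zeros((100, 100, 3), np.uint8)
img[40:60, 40:60, 1] = 200                  # a green patch
layers = color_layers(img)
inside = box_layer(img.shape[:2], [(35, 35, 65, 65)])
print(layers["green"].sum(), inside.sum())  # 400 green pixels, 900 boxed pixels
```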

In some embodiments, image labeling may use information associated with images, including a location (e.g., GPS location) associated with the image and a time stamp when the image was acquired (e.g., time of the day, relative position of sunlight on the image, etc.). This information may be used as a parameter for training an ML model that performs the labeling.

Real-Time Training

The variety of objects that an ML model may encounter in processing natural images, e.g., agricultural images, is vast. Not only is the number of plant species practically unlimited, but also other natural objects like rocks, mud piles, sand formations, etc., may occur in a number of different colors, shapes, and sizes. To add further to this complexity, several agricultural objects, such as fruit and produce, look different during various stages of growth and during various times of the day. For example, some flowers open and close in response to sunlight, and fruit and vegetables change size and color as they become riper.

One operational challenge presented by such an ever-changing environment of an agricultural use case is the time it takes to train an ML model to perform accurate tasks. For example, a typical agricultural workflow may generate hundreds of millions of images, amounting to terabytes of data, during a typical working day. Because large-scale computing resources are typically not available onsite, this data may have to be uploaded to a computing facility where offline processing may be performed (e.g., 1120). The offline computing may include labeling of images, or a subset thereof, identifying images suitable for training, further training of an ML model, and then subsequently downloading the ML model to an in-field ML platform. However, if this training loop takes a few days, it is likely that the flowers, weeds, crop, and fruit whose images were used for training the ML model have already grown or become ripe and changed their visual look. In such a case, the changed objects may present a detection problem to the newly trained ML model.

Therefore, an agricultural environment presents a particularly challenging problem to ML-based object detection techniques. Due to the possibility of errors in identifying agricultural objects, automated agricultural equipment may create operational issues such as improper treatment of desired vegetation or entirely missing certain tasks because objects were not identified or were misidentified. To solve such issues, in some embodiments, ML models may undergo real-time training, among other solutions, as disclosed in the present document. One of the challenges in such a real-time training goal is the amount of data that is produced during the operation of agricultural equipment. For example, 4K images, at 30 frames a second, captured by 8 cameras generate about 6 gigabytes of uncompressed image data per second. The present document discloses techniques that may be used to selectively reduce the amount of data used for real-time training by a considerable amount, e.g., to a few images (e.g., 100 images a day) that the ML detection algorithm deems to be particularly difficult to classify.
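
The roughly 6 gigabytes per second quoted above can be checked with back-of-the-envelope arithmetic, assuming uncompressed 8-bit RGB at 3 bytes per pixel:

```python
width, height = 3840, 2160       # one 4K frame
bytes_per_pixel = 3              # uncompressed 8-bit RGB
fps, cameras = 30, 8

rate = width * height * bytes_per_pixel * fps * cameras
print(f"{rate / 1e9:.2f} GB/s")  # about 5.97 GB/s, i.e., roughly 6 GB/s
```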

FIG. 3 shows an example operational setup 1300 in which real-time detection, confirmation, visualization, and further processing, including training, may be implemented. A platform may be installed on a vehicle 1304 and may include one or more cameras that capture images (e.g., image 1306) of a field environment. The image 1306 may include a row of a crop with possible weeds interspersed in between the crop in a real-world agricultural environment 1316. A region of this image 1306 is shown in greater detail, having multiple vegetation objects that may be a desired vegetation 1314 (e.g., carrots or another crop) and undesired vegetation such as different types of weeds 1310, 1312. The onsite computer platform 1302, disposed on a treatment system 1303 (which may include modular components), may include a compute unit 1308 which may perform image classification/segmentation using various image processing algorithms 1318, which may include one or more ML models.

As disclosed in the present document, various strategies may be used to separate out the weeds 1310, 1312 from the desired vegetation 1314, including use of color information, depth information, background-foreground separation, computer vision, ML models, and so on. The detected objects may further be annotated using an action performed. For example, a solid boundary around weed 1310 indicates that a treatment was applied to this object, a dashed boundary around weed 1312 may indicate that no treatment was performed on this object, while a light dashed boundary around objects 1314 indicates that this was classified to be desired vegetation.

An object such as the weed 1312, which was detected as an object, but not with such confidence as to take a specific action on the object, may be flagged for user input or labeling in real-time. For example, an image, or a portion of the image that includes the object, may be sent to a user interface of a field-deployed unit, for example computing device 1320, and/or further be sent to the offline computing platform 1120 for labeling, user feedback, and ML training in real-time. In one example, an ML model or algorithm (as described in this disclosure) can be configured to perform object detection with multiple attributes. In one example, an object detection can include one or more of: 1. a detection that a portion of an image includes an object and a confidence level associated with the detection; 2. the pixels associated with the object detection; 3. the identification or class of the object (there being a plurality of classes including species of plant, landmark, state of object, phenological stage of plant, etc.); 4. the confidence level associated with the classification; or a combination thereof.
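
The four detection attributes enumerated above might be carried in a record such as the following sketch; the field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One object detection with the attributes enumerated above."""
    detection_confidence: float        # 1. confidence that a region contains an object
    pixel_mask: list[tuple[int, int]]  # 2. pixels associated with the detection
    object_class: str                  # 3. e.g., plant species, landmark, growth stage
    class_confidence: float            # 4. confidence in the classification

det = Detection(
    detection_confidence=0.62,
    pixel_mask=[(120, 45), (121, 45), (121, 46)],
    object_class="weed",
    class_confidence=0.41,
)
print(det.object_class, det.class_confidence)
```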

FIG. 4A shows an example configuration 1401 in which a particular image region 1312 is displayed on the computing device 1120. In addition to displaying the image, the user interface may include additional information such as a label 1411 that may indicate a type of object detected by the processing. Examples of such a label may include “weed,” “vegetation,” “soil,” or “nothing.” Alternatively, or in addition, specific types of weeds or vegetation (e.g., carrot leaf, cabbage, etc.) may be indicated. The user interface may further include additional information fields 1412 that provide further insights into the image analysis algorithms. Such information may be useful for ongoing monitoring of effectiveness of the system operation, auditing, and training purposes. Information fields 1412 may provide information such as a confidence level, coordinates of the object, whether this is a new object or a previously seen object, and so on. In another example, information fields 1412 can be selectable by a user to input feedback, which itself can be an event or determination by the user. For example, the compute unit 1308 can detect an object, the object being a certain species of plant type or plant classification. The object can be displayed on user interface 1420 with a determined label 1411 determined by the compute unit 1308. If the user determines that the classification of the object detected by compute unit 1308 is incorrect, the user can select an alternate classification among a list of selectable classifications. In one example, the information fields 1412 are selectable classifications. In another example, the compute unit 1308 can make an inference on a given image 1306 by drawing a bounding box (or generating a bounding box) over a portion of the image that includes the detected object, for example a vegetation object 1310 or 1312. In one example, while the object detected may be correct, the bounding box size is not optimized. In this case, a user can view the specific bounding box generated by the compute unit 1308 and input feedback and/or give an evaluation of the inference made by the compute unit 1308. For example, the user can select an information field 1412 that indicates that the bounding box is oversized, does not cover a portion of the object in the frame, or a combination thereof. In another example, the user, via the computing device 1320, can interact with the user interface 1420 and draw the correct bounding box (correct being that whatever the human draws is considered the correct bounding box, and can be used as the ground truth bounding box for further training), and the treatment system 1303 and other databases can receive and log the selection and interaction by the user. In this case, drawing bounding boxes is only one of a plurality of examples of a machine learning algorithm of a compute unit performing object detection in a given image frame. Other examples can be performing pixel segmentation, instance classification, or segmentation via superpixels, and the user feedback via computing device 1320 can be that of confirming or correcting segmentation, classification, etc. determined by the compute unit.

In one example, only detections below a certain threshold will be sent to a computing device for further analysis, including user feedback and analysis. In one example, all detections of a certain classification or classifications (no matter the confidence) can be sent, in real time, to a user for feedback (whether the user interface is requesting a confirmation, a correction, an optimization (that is, drawing or labelling a better bounding box despite the machine learning model or algorithm detecting and labelling the correct pixels associated with the correct object identity, but not necessarily the best-fit box), or a combination thereof).
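
A minimal sketch of the routing policy described in this paragraph, with an assumed confidence threshold and an assumed always-review class list:

```python
REVIEW_THRESHOLD = 0.70                         # assumed; tunable per deployment
ALWAYS_REVIEW_CLASSES = {"unknown_vegetation"}  # classes sent regardless of confidence

def needs_user_feedback(object_class: str, confidence: float) -> bool:
    """Send low-confidence detections, or all detections of flagged classes."""
    return confidence < REVIEW_THRESHOLD or object_class in ALWAYS_REVIEW_CLASSES

print(needs_user_feedback("weed", 0.55))                # True: below threshold
print(needs_user_feedback("unknown_vegetation", 0.95))  # True: flagged class
print(needs_user_feedback("crop", 0.93))                # False: confident, not flagged
```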

FIG. 4B shows another configuration 1402 in which images obtained from sensors include an indication of a treatment action 1422, as shown by the image portion 1307, which may be a frame captured by the compute unit 1308 subsequent to the frame used by the compute unit to detect vegetation object 1312. In one example, the treatment action 1422 is detected by the presence of multiple consecutive images, such that comparing a frame that does not have pixels associated with a liquid to a frame that does have pixels associated with a liquid is used to determine the presence of a projectile being emitted towards a target. In some embodiments, one or more sensors may be directed towards the target objects to which a treatment is applied. The one or more sensors may be controlled to capture images during a time interval in which the treatment action is expected to occur. The treatment action may typically be transient and much smaller in pixel width than other agricultural objects. For example, a treatment action such as a pesticide squirt or a laser beam may be approximately 5 to 50 pixels wide, and may appear in 1-10 consecutive images, which typically is fewer than the number of consecutive images in which other agricultural objects such as weeds and crop may be present. The treatment action can be that of a liquid-based projectile detected by the treatment system 1303 or the result of a liquid-based treatment onto a plant object or a ground region including a splat (a splat being a liquid projectile impacting a surface), where the splat is detected by determining a change in color of the ground or a plant. For example, a liquid impacting the ground will change the color of the ground detected by the image capture device. The image capture device can then determine that a color change is the result of a liquid-based projectile, spray, or spray action impacting the ground or plant object near the detected color change.
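
A minimal sketch of splat detection by color change between consecutive frames, assuming illustrative thresholds that a real system would tune:

```python
import numpy as np

def splat_detected(before: np.ndarray, after: np.ndarray,
                   region: tuple[int, int, int, int],
                   min_changed_fraction: float = 0.05,
                   per_pixel_delta: int = 40) -> bool:
    """Flag a liquid impact if enough pixels near the target change color.

    Thresholds here are illustrative assumptions, not values from the disclosure.
    """
    x0, y0, x1, y1 = region
    a = before[y0:y1, x0:x1].astype(int)
    b = after[y0:y1, x0:x1].astype(int)
    changed = np.abs(b - a).sum(axis=-1) > per_pixel_delta
    return changed.mean() > min_changed_fraction

before = np.full((100, 100, 3), 90, np.uint8)  # dry soil
after = before.copy()
after[40:55, 40:55] = 40                       # darker, wet patch after the spray
print(splat_detected(before, after, region=(30, 30, 70, 70)))  # True
```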

As shown in the configuration 1402 in FIG. 4C, the image portion 6002 in which the treatment action is detected may be sent to the computing device 1320 for auditing, training, and feedback purposes. In this example, multiple consecutive images are captured and/or cropped from larger images captured and sent to the computing device 1320 for analysis. The corresponding label 1411 or 1412 may indicate information about the detected treatment action, an estimate of its effectiveness (e.g., whether the treatment action actually affected the target object), etc. In some embodiments, the user interface may allow a user to play a closed-loop video of the consecutive frames to be able to visually see the occurrence of the treatment action. In one implementation, all intended spray events are sent to computing device 1320 whether or not the platform has detected a spray. That is, as long as the treatment system 1303 instructs a component of the treatment system 1303 to emit a liquid projectile, the onsite compute platform 1302, via the compute unit 1308, can send a series of images, a video, or a similar file format (such as a graphics interchange format (GIF)) of a portion of a series of images captured that are expected to show a treatment action. For example, if the treatment system 1303 sent instructions to fire a liquid projectile at a detected plant object, the expected series of images or portions of images that capture the treatment action, or the projectile firing and impacting the plant object, will be displayed to a user via computing device 1320 (the computing device may be physically located near the supposed treatment action or connected to the onsite compute platform 1302 from a cloud or fog network to evaluate performance of the treatment system 1303 and any related agricultural functions in real time). In one example, if the user using the computing device 1320 does not see any treatment action, such as a liquid projectile or a spray impact on a plant object, the user can then determine that a spray action did not occur and select or input such determination into the computing device 1320 for further analysis. In this example, the treatment system 1303 may have intended for a component to activate and emit a liquid projectile at the target, but due to internal factors, such as the system losing track of the location of the object after determining the location of the object in a given frame, or none of the treatment units having finished their current task, none of the treatment units were selected to fire, even though the treatment system 1303 intended to select a treatment unit. In another example, the treatment system 1303, via one of its component treatment units, did in fact emit a liquid projectile but aimed at the wrong object, or an unintended region of interest, such that the images associated with the predicted location of the impact were captured and sent to computing device 1320. Each of these examples is a way for a user to determine that there was a misstep anywhere from the moment the treatment system 1303, via a detection (whether valid or not) by compute unit 1308, determined and generated an instruction to spray the detected object, such as vegetation object 1312, to the system actually emitting a projectile at the intended target; due to systematic reasons, the spray either does not happen, or completely misses the intended target object (vegetation object 1312) and does not show up at all in the image portion 6002 (which includes a series of images capturing image portion 6002 in multiple consecutive frames). 
And in some of these above examples, a user will see a series of images where there is no spray at all, and the user can select and determine that the spray either happened but completely missed the intended target, or the spray did not get emitted from the treatment system 1303 at all.

In another implementation, a spray action can hit the target; miss the target (but the spray action itself still be captured by the compute unit 1308); or overspray the target, such that the target is impacted but so is an unintended nearby region of the target object, for example if another plant object is also impacted or if less than a certain amount of detected splash (the certain amount being programmable by a user) covers the target object. These determinations can all be selections on the computing device 1320 via the interface 1420, and the selections can be the information fields 1412. For example, the information field 1412 can include selectable inputs representing confirmation of good target intercept, partial intercept, near miss, complete miss, overspray, bad (wrong) target, confirmation of intercept but towards the wrong target, or overspray such that the projectile correctly intercepted the target but also hit a target that was not supposed to be sprayed or treated. In another example, the user can determine that the inference determined by the compute unit is not accurate, or at least not optimal, and the user can fix or input the correct or optimized label for the detected object. For example, a user can be displayed an image patch that includes a weed object and background. The bounding box around the weed portion is not optimized. In this instance, the user can select a field 1412 which indicates that the bounding box is incorrect or not optimal. Additionally, the user can, via the user interface 1420, draw the correct bounding box and label the object. The user-labelled bounding box and object classification (the classification of the object inside the labelled bounding box) can then be used as ground truth for further analysis, including metrics analysis of treatment performance by the treatment system 1303 or for further ML training.

FIG. 4D shows another operational workflow similar to the workflow described with respect to FIG. 3. Unlike FIG. 3, in the workflow depicted in FIG. 4D, the detection of an object for which the next action is uncertain triggers an alert on a user interface which prompts a user to physically go to the location where the object was detected to inspect what type of object it was. To assist this workflow, the user interface may display a position, e.g., a geographic location marker, for the object. For example, a user may be able to use software installed on the computing device 1320 to track images taken by a camera on the computing device 1320 to ascertain whether the user is going in the correct direction in the field towards the object that is to be manually inspected. The user interface may further present a dialogue menu to the user to allow the user to confirm that the user has located the object, inspected the object, and has determined a classification and/or any attributes of the object. Upon receiving the confirmation on the user interface, the image may be labeled accordingly, and the ML model may be trained to use the newly acquired information.

In some embodiments, the above-described workflow may be used to perform data gathering in the field based on a certain selection criterion. For example, farm equipment may be deployed in a field to take images, detect objects in the images in real-time, and present detected objects that meet a certain selection criterion on the user interface. Alternatively, objects that do not meet a certain criterion may be presented on the user interface. For example, in some embodiments, the images captured during a field run may be analyzed to detect locations of weeds. At the end of the field run, an image of the farm may be displayed on the user interface to indicate weed locations. In some embodiments, only weeds that are too close to a crop for automated spraying may be displayed (e.g., within a pre-configured distance according to target accuracy of the spraying mechanism on the agricultural platform). In some embodiments, only vegetables, flowers, or fruit that are ripe and ready to be picked may be displayed. In some embodiments, unknown vegetation that the ML detector is not able to discern, and thus needs user feedback on, may be displayed. These are just some examples of the criteria by which an ML model may be trained to detect objects and present them to a user. In some cases, the ML detection may be followed by a certain automated action by the farm-deployed equipment, e.g., treating a detected weed with a pesticide or a laser beam, etc.
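
A minimal sketch of the "weed too close to a crop" display criterion, assuming plant positions in millimeters and an assumed spray-accuracy margin:

```python
import math

SPRAY_ACCURACY_MM = 30.0   # assumed target accuracy of the spraying mechanism

def too_close_for_spraying(weed_xy, crop_positions, margin_mm=SPRAY_ACCURACY_MM):
    """True if a weed is within the spray-accuracy margin of any crop plant."""
    return any(math.dist(weed_xy, c) < margin_mm for c in crop_positions)

crops = [(0.0, 0.0), (150.0, 0.0)]
weeds = [(20.0, 10.0), (80.0, 0.0)]
flagged = [w for w in weeds if too_close_for_spraying(w, crops)]
print(flagged)   # [(20.0, 10.0)] would be shown to the user for manual handling
```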

FIG. 5 depicts an example flowchart of the workflow for real-time training. The general workflow may include obtaining or capturing an image, ingesting the image to an image processing function (1510) for real-time action, and noticing one or more objects for which a further action is needed, e.g., because the machine does not understand the object, or the machine has been instructed to identify and flag such objects to a user. At 1520, the objects and/or images resulting from 1510 are sent to a user interface of a device such as a portable computer or a tablet that may be present in the field. At 1530, either the entire image or an image portion that includes the objects of interest may be sent for further action such as labeling and training on objects that the machine was not able to identify. As a result of the labeling and training, the onboard ML model may be updated in real-time (1540). In particular, the number of representative images that are selected for training and labeling (1550) may be significantly fewer than the total number of images captured (e.g., 0.1 percent or fewer of all images). In some embodiments, a user may be prompted to perform a specific action such as identifying the objects in the real world to provide user feedback that may be used for the labeling/training (1550). In some embodiments, for resource efficiency, the solicited feedback may be requested for only a subset of images (e.g., every Nth image). In some embodiments, a user confirmation that the requested action was performed may be solicited for the completion of real-time training and detection.

Multi-Model Training

FIG. 6 shows a scheme 1600 of benefitting from multiple ML models (1620) and performing training of the multiple ML models. Images 1602 may be ingested by multiple ML detectors 1604 that are configured to perform ML detection on multiple images 1606 to produce labeled output images 1608. Labeled image outputs from the same images processed through multiple ML models may be provided to a verification stage 1610 as a super-labeled image that includes labeling performed by different ML models for a same image.

In some embodiments, a differential comparison of various ML models may be generated at the verification stage 1610. For example, one attribute of the comparison may be to highlight all objects that are commonly identified by multiple ML models as being a same type of object. Another attribute of the comparison may be to highlight objects that are differently identified by different ML models. For example, the various ML models may include an ML model ML1.0 and another ML model ML1.1 that is obtained after training ML1.0. In some embodiments, objects that are additionally identified by ML1.1 compared to ML1.0 may be highlighted. This may advantageously show to a user the way by which training of the ML model is progressing. For example, there may be a total of 500 weeds in an image. ML1.0 may identify 400 weeds, while ML1.1 may identify 450 weeds. A user may note that, of the 450 weeds identified by ML1.1, some weeds are identical to those identified by ML1.0, while some weeds were missed by ML1.0, and some weeds were missed by ML1.1. This may provide additional input to further training. For example, ML1.1 may miss weeds that present themselves with certain visual attributes such as longer leaves, browner color, or higher height, etc. The verification stage 1610 may identify ground truth objects detected in the images, typically objects that are commonly detected by all ML detectors, and feed them to a training stage 1612 for further training of ML models.
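
A minimal sketch of the differential comparison, treating detections as matched object positions for simplicity; a real verification stage would more likely match bounding boxes by overlap (e.g., IoU), so the set representation here is an assumption for illustration.

```python
def compare_models(dets_v10: set[tuple[int, int]],
                   dets_v11: set[tuple[int, int]]):
    """Split detections (here, object centers on a grid) into common/gained/lost."""
    return {
        "common": dets_v10 & dets_v11,  # found by both ML1.0 and ML1.1
        "gained": dets_v11 - dets_v10,  # new in ML1.1: training progress
        "lost":   dets_v10 - dets_v11,  # regressions worth reviewing
    }

ml10 = {(10, 4), (22, 7), (35, 9)}
ml11 = {(10, 4), (22, 7), (48, 3)}
diff = compare_models(ml10, ml11)
print(len(diff["common"]), "common,", len(diff["gained"]), "gained,",
      len(diff["lost"]), "lost")   # 2 common, 1 gained, 1 lost
```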

It is noted that the various ML models depicted in FIG. 6 may be different versions of the same ML model as it gets trained and/or different ML models. For example, different ML models may be constructed using different hyperparameters that map to different object attributes. For example, one ML model may be designed to identify certain types of weeds, another ML model may be designed to identify certain types of desirable vegetation and classify everything else as a weed, and so on.

In some embodiments, a computer-implemented method of sensor input processing, for example using the techniques described in FIG. 6, includes receiving sensor input from the sensor; processing the sensor input by multiple machine learning (ML) algorithms, each using a corresponding ML model for generating labels for objects identified in the sensor input; combining labels generated by each ML algorithm to generate a super-imposed labeled sensor input frame; comparing outputs of the ML algorithms to determine similarities or differences; and using results of the comparing for improving an operational characteristic of the sensor input processing.

In this method, the operational characteristic may be improved by performing further training of one or more ML models using the super-imposed labeled sensor input frame and/or the similarities or differences in outputs of the ML algorithms, or by other offline means. In some embodiments, the ML models include ML models that are based on different sets of hyperparameters. In some embodiments, the ML models include ML models that are different versions of a same baseline ML model that has undergone different training. In some embodiments, the further training is performed based on user feedback on the super-labeled sensor input frame. In some embodiments, the method further includes generating ML performance metrics based on the comparison of outputs. In some embodiments, the method further includes presenting the ML performance metrics on a user interface. Such a presentation may facilitate diagnosis by a human user of the effectiveness of the agricultural operation performed using the analysis of the sensor inputs.

FIG. 7 shows an example process 1700 of multi-model training. Multiple images or image portions (patches) may be received by the multiple ML models (1720). The multiple ML detectors may be run on the received image portions (1730) using ML models that are different models (e.g., different hyperparameters) or updated versions of a same baseline model. The resulting labeled images may be combined (1740) into a super-imposed labeled image. Differences and/or similarities of ML detections may be compared at 1750. The super-imposed labeled images may be used for further training (1760) with or without additional user feedback. The super-imposed labeled images may also be used to understand the progression of ML updating. A visualization may be generated, e.g., a heat map showing similarities and differences among various models.

Example Embodiments of Layered/Cascade Processing

FIG. 8A illustrates an example diagram 1800 for ingesting an image and performing various computer vision and machine learning algorithms onto various portions or layers of the image to extract and detect features of the image.

As disclosed in the present document, multiple image processing algorithms can be applied in layers to the same image or portions of the same image. For example, an image 1810 can be acquired by an image capture device and loaded onto a local compute unit of an agricultural platform. For illustration purposes only, the image 1810 captured can be an image of a row crop farm having one or more beds 1812 supporting a plurality of crops, such as carrots, and weeds, and one or more furrows or tracks 1814 for a vehicle's wheels to run through as a vehicle passes the row. One or more ML algorithms and computer vision (CV) algorithms in the compute unit, or accessible by the compute unit in real time via the cloud or an edge compute device containing the machine learning algorithm and computer vision algorithm, such as CV algorithm 1820 and an ML algorithm 1830, can be used to partition the image 1810 into analyzed images with features extracted, with the goal of accurately detecting objects in the given image 1810. For example, the first CV algorithm 1820, configured to separate beds and furrows, can be applied to analyze and segment the image 1810, classifying portions of the image related to beds, such as partitioned image 1813, and portions of the image related to furrows, such as partitioned background image 1815. One purpose of deploying this technique is that the treatment module does not have to run an ML detector on the entire image 1810, but only on portions where objects of interest may be. The partitioning of beds and furrows, as is the partitioning of green and background, are just two of many examples of performing a plurality of computer vision and machine learning techniques on an image to reduce computation load while generating accurate detections of features in the real world. Next, the system will have generated a partitioned image 1816 having pixels associated with beds and pixels associated with furrows, such as that of partitioned image 1813 and partitioned background image 1815. The ML algorithm 1830, which for example can be a machine learning algorithm to detect plant objects of interest, such as crop plants and various species of weeds, can be implemented to further analyze the image 1810 or the partitioned image 1816, or only the portion of the image 1810 that is partitioned image 1813, and not the partitioned image 1815. This would allow the ML detector or machine learning algorithm 1830 to analyze fewer pixels or tiles of pixels and reduce the load on the system, while the system has a high probability that the machine learning detector is scanning the most important areas of the image 1810. In this example, the detector would run detections on only a portion of the partitioned image 1816, for example a portion of the partitioned image 1813, such as a patch 1832 of the partitioned image 1813. The treatment system can then draw bounding boxes, semantically classify, or perform various machine learning methods deployed by machine learning algorithm 1830, for example detect objects and draw bounding boxes, and generate a machine-labelled or machine-detected image 1842, which is a labeled image of a portion of the original intake image 1810. The agricultural observation and treatment system can then use those detections to determine which detections are target objects to treat, target the objects in the real world, track the detected objects in subsequent frames, and perform a treatment action on the detected object in the real world.
Additionally, using multiple layers of computer vision and machine learning algorithms to optimize the computing load on a compute unit can be performed to improve Visual Simultaneous Localization and Mapping (VSLAM). For example, vegetation segmentation can be performed to detect green objects. In the VSLAM pipeline for matching keypoints from frame to subsequent frames by the same sensor, the compute module or compute unit associated with the sensors receiving the images can determine that points associated with green objects are real objects in the world that are stationary and can be tracked via VSLAM by sensors and compute units of each component treatment module for local pose estimation. This would allow the VSLAM algorithm to analyze keypoints, keypoints in this case being points related to corners or contours or edges of green objects, with higher confidence that the keypoints generated and analyzed are of higher quality than arbitrary salient points, being points of known objects, such as objects corresponding to green pixels, since the system will know beforehand that green pixels are of vegetation, which are physical objects in space that are stationary and are of similar size and topography as that of target objects for treatment that will be tracked.
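
A minimal sketch of the bed/furrow cascade of FIG. 8A, with a stand-in brightness test in place of CV algorithm 1820 and a placeholder detector in place of ML algorithm 1830; both stand-ins are assumptions for illustration only.

```python
import numpy as np

def bed_mask(frame: np.ndarray) -> np.ndarray:
    """Stand-in for CV algorithm 1820: mark bed pixels (here, the brighter ones)."""
    return frame.mean(axis=-1) > 100

def detect_on_beds(frame: np.ndarray, detector, tile: int = 64):
    """Run the ML detector only on tiles that overlap the bed partition."""
    mask = bed_mask(frame)
    hits = []
    for r in range(0, frame.shape[0], tile):
        for c in range(0, frame.shape[1], tile):
            if mask[r:r + tile, c:c + tile].any():  # skip furrow-only tiles
                hits.extend(detector(frame[r:r + tile, c:c + tile], (r, c)))
    return hits

frame = np.zeros((128, 256, 3), np.uint8)
frame[:, :128] = 160                             # left half: bed; right half: furrow
fake_detector = lambda patch, origin: [origin]   # placeholder for ML algorithm 1830
print(detect_on_beds(frame, fake_detector))      # only tiles over the bed are scanned
```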

FIG. 8B shows another example embodiment, which shows processing of agricultural sensor input at different levels of detail. As depicted in the flow of the process from left to right, initially the treatment system 1303, on the agricultural vehicle 1304, ingests an image. The treatment system 1303 may be configured with a compute module 1852 configured to implement a first ML model that performs object detection via a first ML algorithm, including that of image processing algorithm 1318, for example. The ML model is trained to detect objects at a greater level of detail than a subsequent processing stage. The image resolution handled by the first ML may be from a plurality of resolutions ingested, such as 4K, 1000×1000, 1000×640, 768×768, etc. After the first ML (ML1) detects regions of interest where objects are detected, these regions may be cropped (each crop itself being a small image) and may be fed into a second processing stage in the compute module 1852. Here, a second ML model may perform instance classification, e.g., given an image, determine a type of object that the image represents. In some examples, the images ingested by the treatment system 1303 can be native to 50+ megapixels, 100+ megapixels, or 200+ megapixels, of which an ML detector can be configured to perform object detection on the image, either at the native resolution, a downsampled resolution, or a portion of the image.

In some embodiments, the second ML model (ML2) may be a better classifier of objects. For example, ML model 2 may be trained to operate on a 30×30 pixel region to provide a binary result about whether the pixel region belongs to a particular object such as a weed or a crop. In some embodiments, the ML model 2 may be trained to provide a result that the detected region includes a weed or a crop or soil or another pattern. In some embodiments, where the first image corresponds to a 3 ft×2 ft area of soil at 8K resolution, the 30×30 pixel image may correspond approximately to a 4 mm×4 mm real-world dimension. In another example, the first ML model can be implemented on a downsampled native image to perform object detection, and the second ML model can be implemented on the native image patch detected by the first ML model, such that the area of the image analyzed by the first model on the downsampled image is the same area as that of the native image used by ML model 2 to perform instance classification.
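
A minimal sketch of the downsample-then-crop flow just described, with placeholder detect/classify callables standing in for ML1 and ML2; the scale factor and box format are assumptions:

```python
import numpy as np

def two_stage(native: np.ndarray, detect, classify, scale: int = 4):
    """ML1 detects on a downsampled view; ML2 classifies the native-resolution crop."""
    small = native[::scale, ::scale]            # crude downsample for illustration
    results = []
    for (x0, y0, x1, y1) in detect(small):      # boxes in downsampled coordinates
        crop = native[y0 * scale:y1 * scale, x0 * scale:x1 * scale]
        results.append(classify(crop))          # same real-world area, full detail
    return results

native = np.zeros((1024, 1024, 3), np.uint8)
detect = lambda img: [(10, 10, 20, 20)]         # placeholder ML1
classify = lambda crop: ("weed", crop.shape[:2])  # placeholder ML2
print(two_stage(native, detect, classify))      # [('weed', (40, 40))]
```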

In some embodiments, both ML1 and ML2 (which may be ML models, ML algorithms, ML algorithms configured to run ML models, ML algorithms configured to output ML models, or a combination thereof) may be trained on the same underlying resolution, except that the training image size used may be different for the two models (e.g., 768×768 for ML1 and 30×30 pixels for ML2). In some embodiments, ML1 may be trained using a reduced-detail version of the original image, e.g., a downsampled-resolution version, a single-color version, or a reduced bits-per-pixel version, compared to ML2, which may be trained on a smaller-sized full-resolution image portion.

In some embodiments, the agricultural objects of interest may include multiple different object types such as different types of crops, fruits, flowers, or weeds, or soil in different conditions. Upon execution, ML1 may provide an indication of the type of object, or identify a specific type of object, and provide a decision confidence level.

In the subsequent processing stage, ML2 may provide an outcome of its analysis. The results from ML1 and ML2 may be further used for agricultural operations in one or more of the following ways:

A. If the object identified by ML2 disagrees with the object detected by ML1, then the objects, or the corresponding image regions for which such a disagreement occurs, may be uploaded via a network connection 1854. In some embodiments, the image region where conflicting results were found may be corrected in real-time by a human operator or by another ML model that is trained to resolve such conflicts. In some embodiments, the image portion may be used for further training of the ML1 and/or ML2 models.

B. In some embodiments, every weed detection, or every object of a particular type (e.g., weed, soil, stage of growth/state of plant (e.g., fruit), portion of a plant that has its own classification, landmark, etc.), may be uploaded for further review/training.

C. In some embodiments, objects detected by ML1 or ML2 as meeting certain detection thresholds may be uploaded for further review/training.

In some embodiments, if ML1 and ML2 provide object detections that are consistent with each other, such objects may be identified as target objects and may be selected for further agricultural operations according to one or more of the following criteria.

1. In one example implementation, ML1 may detect a specific object for treatment (e.g., a weed) while ML2 does not detect this to be an object for treatment; such an object may not be treated and may be used for further training/resolution. For example, ML1 receives an image with a plurality of agricultural objects, patterns, landmarks, and background features. ML1 detects a first agricultural object of a first classification in a portion of the image. In one example, the first classification is a crop. In one example, the first classification can be one of a number of different species of weed, each having its own classification. In one example, the first classification can be the state of a plant (a stage of growth or phenological stage with distinct physical features). In one example, ML2 is configured to perform instance classification among crop species, weed species, and soil patterns. ML2 analyzes the same portion of the image labeled by ML1 as a first agricultural object of a first classification and classifies the image. If the classification of the image is the same as the first classification determined by ML1, then both ML models are in agreement, and further processing can be performed, such as tracking the object and performing a treatment action. If the classifications are in disagreement, in one example, the specific patch or portion of the image can be sent to a user interface via network connection 1854 so that a user can verify the correct classification, input the user's selection or determination of the actual classification, and use the user's determination for further processing, including analyzing metrics or further ML training. For example, a treatment system 1303 can ingest a 50-megapixel image. ML1 can run a detector that analyzes a 768×768 tile of the 50-megapixel image. ML1 can detect that a portion of pixels (by a bounding box) of the 768×768 tile includes an agricultural object and determine that the agricultural object in that portion of the tile is a weed, for example determining that object 1858 is a weed. ML2 can run a detector that analyzes the same portion of the 768×768 tile, analyzes the 50-megapixel image directly by running a tile of the same size and location as that portion of the 768×768 tile, or ingests a full image (the full image in this case being the same portion of the 768×768 tile of the 50-megapixel image), and determines whether the image or portion of the image is that of a weed or another classification. If ML2 determines that the classification is weed, then the treatment system 1303 can be configured to perform a treatment action on object 1858. If ML2 determines that the classification is a pattern of soil (for example, because the cracks and contours of a soil pattern look similar to the pattern of a plant), then a treatment action will not happen, and the portion of the image (as an image patch) can be sent, in real time, to a compute device 1320 for review. In one example, ML1 can detect an object with high certainty (above 70% confidence). If ML2 classifies the object detected by the ML1 detector differently, then a user can analyze the discrepancy between the ML2 classifier and the detection result of the ML1 detector. In one example, any detection above a certain confidence (e.g., over 70% or over 90%) is not sent to a user or to further offline processing for analysis, since the confidence is high.

2. In one example implementation, ML1 may detect a specific object for non-treatment (e.g., a crop) while ML2 detects this to be an object for treatment; such an object may not be treated and may be used for further training/resolution.

In some embodiments, an object (e.g., 1858) may be detected as an object for treatment by both ML1 and ML2. In this case, the treatment mechanism may be activated to treat this object.

In some embodiments, an object (e.g., 1856) may be detected as an object for non-treatment by both ML1 and ML2. In this case, the treatment mechanism may refrain from activating to treat this object.
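A minimal sketch of the agreement rule running through examples 1 and 2 and the two paragraphs above is given below. The 0.70 and 0.90 confidence bands echo the figures mentioned in example 1; the dictionary field names are assumptions for illustration only.

```python
# Hedged sketch of the ML1/ML2 agreement logic described above. The 0.70
# and 0.90 bands mirror the example confidences in the text; the field
# names ("label", "confidence") are hypothetical.
def decide_action(ml1_detection, ml2_label):
    if ml1_detection["label"] == ml2_label:
        # Models agree: treat weeds, refrain from treating crops/soil.
        return "treat" if ml2_label == "weed" else "no_treatment"
    if ml1_detection["confidence"] > 0.90:
        # Very confident detections are accepted without further review.
        return "treat" if ml1_detection["label"] == "weed" else "no_treatment"
    # Disagreement at modest confidence: withhold treatment and send the
    # image patch to a user interface (e.g., over network connection 1854).
    return "send_for_review"
```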

FIG. 9 is a flowchart 1900 of a method of processing agricultural images. At 1910, images of an agricultural environment are obtained. For example, onboard cameras of an agricultural platform may be used to capture images and store them to a database. A computer platform may then obtain the images from the database. Multiple image processing schemes may be applied to the images as follows. At 1920, a first image processing scheme may be applied to the images. The resulting outputs may provide agricultural images from which certain pixels or pixel regions may be eliminated from further processing (1930). For example, certain pixels or pixel regions may be identified as background (e.g., ground, rocks, or other objects that are not of further interest) and may not be further processed. At 1940, one or more additional image processing schemes may be applied to the simplified, reduced images output from the first image processing scheme. For example, the first image processing scheme may be a CV algorithm, while the second image processing scheme may be an ML detection algorithm. In some embodiments, images that have undergone multiple detections through multiple image processing schemes may be provided to a machine-assisted human labeling operation and/or selected for further training of the ML models used in the process 1900. In some embodiments, the resulting user assistance and training may be used for further improvement of ML performance metrics. For example, an optional visualization may be provided to allow a user to track improvements achieved through the application of multiple image processing schemes. In the process 1900, the multi-scheme operation may rely on various attributes of pixels, such as dividing images into background/foreground, using the depth of pixels to identify which pixels represent ground and which may represent vegetation, using pixel colors to provide bounding boxes that contain detected objects, and so on. FIG. 8 and the associated description provide some example embodiments of the process 1900.
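As a concrete illustration of the flow in FIG. 9, the sketch below removes background pixels with a first (depth-based) scheme before a second (ML) scheme runs. The ground-depth threshold and the detector callable are assumed placeholders, not elements named in this document.

```python
# Hedged sketch of process 1900: scheme 1 culls background pixels using
# per-pixel depth; scheme 2 (an ML detector) sees only the reduced image.
# The 1.5 m ground threshold and the ml_detect callable are assumptions.
import numpy as np

def two_scheme_pipeline(image, depth_map, ml_detect, ground_depth_m=1.5):
    # Scheme 1: pixels at or beyond the ground plane are background
    # (soil, rocks) and are zeroed so later stages skip them.
    foreground = depth_map < ground_depth_m
    reduced = np.where(foreground[..., None], image, 0)

    # Scheme 2: the ML detection algorithm runs on the simplified image.
    return ml_detect(reduced)
```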

A method of processing agricultural images according to the foregoing techniques includes comparing object detections performed by multiple image processing schemes to determine a set of ground truth images from which at least one machine learning (ML) model used by at least one ML algorithm included in the multiple image processing schemes is trained, wherein the multiple image processing schemes include two or more of: (a) an image processing scheme that includes a cascade of multiple ML algorithms; (b) an image processing scheme that includes image annotation based on user feedback; (c) an image processing scheme that includes a cascade of an ML algorithm or a computer vision (CV) algorithm and user feedback. In some embodiments, (c) comprises first annotating images using the ML algorithm or the CV algorithm, followed by the user feedback. In some embodiments, (c) comprises first annotating images using the user feedback, followed by annotating images using the ML algorithm or the CV algorithm. In some embodiments, the set of ground truth images includes a first set of images analyzed by the multiple image processing schemes and a second set of images that are intervening images to which results of the first set of images are propagated.

Multi-Model Training and Refinement

FIG. 10 depicts an example scheme 2000 that highlights, among other things, the training aspect of an agricultural image processing system. A database 2002 may store images of an agricultural environment. Some or all of these images (2004) may be provided to multiple ML detection schemes in the data labelling pipeline 2050. For example, in image processing scheme 2010, a user may annotate images using a user interface 2012 on which user feedback is provided to generate labeled images 2014. The labeled images 2014 can be recognized as a group of ground truth images, or a training set, or labelled images 2052 for further training. In one example, in image processing scheme 2030, the labelled images 2014 from image processing scheme 2010 can be ingested and processed via a propagation algorithm 2042 configured to propagate the labels of a labelled image 2014 to subsequent or prior captured image frames near the captured image that was used to create the labelled image 2014. The propagated images can also be recognized as a group of ground truth images, or a training set, or labelled images 2052 for further training. In another example, in image processing scheme 2020, an ML algorithm or model, or a CV algorithm, may ingest input images and annotate them using an image processing scheme 2022 to produce annotated images 2024. The image processing scheme 2022 may be an ML scheme or a CV scheme. The annotated images may be input to an image processing scheme 2032 in which user feedback may be used to further improve object detection, including procuring labelled images 2034. In one example, the labelled or annotated images 2024 can be used directly as ground truth images or labelled images 2052 (using pseudo-labelling, that is, having the model label data and incorporating the model annotations into a training set). In one example, the annotated images 2024 can be corrected, confirmed, verified, or optimized by a user via a user interface to generate labelled images 2034. The labelled images 2034 can be recognized as a group of ground truth images, or a training set, or labelled images 2052 for further training. An image processing scheme may also ingest images or annotated images output from another image processing scheme and use a further ML algorithm to perform object detection. The resulting annotated images from all the various ways of image annotation and ML detection of objects may be verified or compared. Here, a determination may be made regarding ground truth images based on the comparison, to determine which images should be labeled and selected for further training of the ML algorithms used in the scheme 2000.
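The three labeling paths in scheme 2000 (direct user labels, model pseudo-labels, and propagation to neighboring frames) might be combined as in the following sketch. Every helper callable here is a hypothetical stand-in for a block in FIG. 10, and the keyframe stride is an assumed parameter.

```python
# Hedged sketch of data labelling pipeline 2050. user_label, model_annotate
# and propagate are hypothetical stand-ins for scheme 2010, scheme 2020 and
# the propagation algorithm 2042, respectively.
def build_ground_truth(frames, user_label, model_annotate, propagate, stride=10):
    labeled = []
    for i, frame in enumerate(frames):
        if i % stride == 0:
            labeled.append((frame, user_label(frame)))      # scheme 2010
        else:
            labeled.append((frame, model_annotate(frame)))  # pseudo-labels (2020)

    # Scheme 2030: propagate human labels to nearby captured frames so they
    # too can serve as ground truth / training images 2052.
    labeled.extend(propagate(labeled))
    return labeled
```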

A first set of preferred solutions implemented by some embodiments includes:

1. A computer-implemented method of processing images (e.g., method 2100 depicted in FIG. 11), comprising: capturing (2110), using one or more cameras deployed on an agricultural platform, images of an agricultural environment in which the agricultural platform is operating; annotating (2120) the images using a first machine learning (ML) algorithm, wherein each annotation includes a confidence number associated with that annotation, and wherein the first ML algorithm is trained based on a second set of images captured by the one or more cameras; generating (2130) a subset of the images as a training set for further training of the first ML algorithm, wherein the subset of the images in the training set is generated partially based on user feedback; and transmitting (2140) the subset of images to a training platform that generates a second ML model by training the first ML algorithm using the subset of image data.

2. The method of solution 1, wherein the annotation comprises adding bounding boxes to objects detected in the images.

3. The method of solution 1, wherein the annotation comprises adding a syntax element or a pixel to objects detected in the images.

4. The method of solution 1, wherein the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; providing, based on the determining, the portion of the particular image on a user interface of a user device that is different from the agricultural platform; receiving, after the providing, an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.

5. The method of solution 1, wherein the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; providing, based on the determining, a real-world location indication for the portion of the particular image; receiving, after the providing, an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.

6. The method of solution 1, wherein the training set is generated as the subset of the image data using active learning in which the subset of images is generated using an exclusion criterion that excludes images similar to images that were previously user-verified, or an inclusion criterion that includes images having an aggregate low confidence level.

A second set of preferred solutions implemented by some embodiments includes:

1. A machine learning method (e.g., method 2200 depicted in FIG. 12), comprising: annotating (2210) agricultural images using N machine learning (ML) models, where N is an integer greater than 1; presenting (2220) results of annotation by the N ML models on a user interface; receiving (2230) a user feedback for the results of annotation by the N ML models; and generating (2240) a set of training images based on the user feedback.

2. The method of solution 1, wherein the N ML models correspond to different versions of a same ML model.

3. The method of solution 1, wherein the N ML models implement different ML models.

4. The method of solution 1, wherein the results of annotation are presented by highlighting conflicting annotations.

5. The method of solution 1, wherein the results of annotation are presented by presenting uncertain annotations.

6. The method of solution 1, further including, using the training images to further train at least some of the N ML models.

7. The method of solution 1, further including, using the training images to generate an (N+1)st ML model.

A third set of preferred solutions implemented by some embodiments includes:

1. A method of processing image data (e.g., method 2300 depicted in FIG. 13), comprising: capturing (2310) real-world images of an agricultural environment, wherein each real-world image comprises a plurality of pixels, wherein a corresponding depth value and corresponding one or more color values are associated with each pixel of the plurality of pixels; and detecting (2320) one or more agricultural objects of interest in the real-world images by applying multiple image processing schemes to the real-world images, wherein the multiple image processing schemes include a depth value-based image processing scheme and a color value-based image processing scheme, wherein at least one of the multiple image processing schemes uses a machine learning (ML) model.

2. The method of solution 1, wherein the multiple image processing schemes are applied in an order comprising first applying the depth value-based image processing scheme, in which a subset of pixels whose depth values do not meet a threshold is eliminated from further processing.

3. The method of solution 2, wherein the depth value-based image processing scheme uses the ML model.

4. The method of solution 1, wherein the multiple image processing schemes are applied in an order comprising first applying the color value-based image processing scheme, in which a subset of pixels whose color values do not meet a criterion is eliminated from further processing.

5. The method of solution 4, wherein the color value-based image processing scheme uses the ML model.

6. The method of solution 1, further including: selectively performing a treatment action on the one or more agricultural objects of interest based on a rule, wherein the rule specifies applying a treatment to a first agricultural object that is detected to be a weed, or refraining from applying a treatment to a second agricultural object that is not detected to be an undesirable object.

7. The method of solution 6, wherein the treatment comprises spraying a pesticide or shooting a laser beam.

In some embodiments, the above-described method solutions may be implemented by one or more processors on a computer platform that may be deployed in an agricultural environment.

Example Embodiments for ML Model Training

In some embodiments, field operation of a moving platform (e.g., an agricultural vehicle or an aerial vehicle, etc.) may generate a large amount of data. One of the challenges in managing such a large amount of data is to filter, or cull, the data (e.g., sensor input or image data), separating data that is not useful for an intended purpose from data that is useful for that purpose, such as identification and treatment of agricultural objects such as fruits, vegetables, weeds, and so on. In particular, during real-time operation in a field environment, the algorithm used for processing images, e.g., an ML model used for ML-based object identification, may remain unchanged. However, the data gathered during this operation, or data gathered in other locations, may be used to further train the ML model for future use. As further described herein, some embodiments may use certain rules or criteria to prune, or reduce, the amount of data collected to a smaller subset that is used for such subsequent training. Various rules and criteria for data pruning are possible. For example, in some cases, the collected data may be continuously processed in real-time, but only images collected every so often may be used for subsequent training.

For example, a predetermined period may be used to sample images for further training. The predetermined period may be, e.g., once every 5 seconds or once every 10 seconds. In some embodiments, the predetermined period may be a function of how many times images of a particular agricultural environment have been captured and analyzed previously. For example, when making a first run in a location (e.g., a farm), because the ML model may be relatively untrained for this location, a greater number of images may be captured. In other words, the predetermined period of image capture for subsequent training may be below a threshold (e.g., one image every second). During each subsequent run in this location, fewer and fewer images may be retained for subsequent training of the ML model. For example, the predetermined period may be adjusted from one second to 5 or 10 seconds, or greater.

Another rule or criterion may be based on the distance traversed by the moving platform on which the sensors capturing the images are located. For example, images captured every 5 to 10 feet may be collected for subsequent training. Similar to the predetermined time period, the predetermined spatial distance at which images are collected may also depend on how many data collection/processing runs have previously been made at a particular location. Initially, data may be retained more frequently for training (e.g., once every 2 feet). Subsequent collections may reduce data collection to greater distances, e.g., once every 5 or 10 feet or greater. The distance may also depend on how many objects of interest are being identified. For example, fewer objects detected in a certain region may result in an increase in the distance at which captured images are used for further training. Conversely, if a large number of objects of interest are detected in a certain region, then a greater number of frames may be used for subsequent display and/or training.

Other rules used for data pruning may include a combination of one or more of: using the image data for every object of interest that is detected above a certain confidence level threshold (or, alternatively, below a certain confidence level); using the image data corresponding to each object of interest for which a particular subsequent action is taken, e.g., a treatment action is performed; and so on. A consolidated sketch of such rules follows.
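In this sketch, the time, distance, and confidence rules are combined into a single keep/drop decision. All thresholds and field names are illustrative assumptions rather than values prescribed by this document.

```python
# Hedged sketch of the pruning rules discussed above: keep a frame for
# training when enough time or distance has elapsed since the last kept
# frame, or when a detection is uncertain. All thresholds are illustrative.
def keep_for_training(frame_meta, last_kept_meta, detections,
                      min_seconds=5.0, min_feet=5.0,
                      low_conf=0.40, high_conf=0.90):
    if frame_meta["time_s"] - last_kept_meta["time_s"] >= min_seconds:
        return True                       # time-interval rule
    if frame_meta["odometer_ft"] - last_kept_meta["odometer_ft"] >= min_feet:
        return True                       # distance-traversed rule
    # Confidence rule: uncertain detections are the most useful to retain.
    return any(low_conf <= d["confidence"] < high_conf for d in detections)
```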

FIG. 17 shows an example overview of the data culling process 17000, and FIGS. 14-16 provide additional implementation examples of this process. As shown on the right-hand side of FIG. 17, one feature of this process is a reduction in the amount of data processed by each subsequent data processing operation. In one advantageous aspect, such an arrangement allows for allocation of commensurate computational resources. In another advantageous aspect, by reducing the amount of data stored and processed, real-time responsiveness of the image data processing and decision making in a field is possible even in cases where computational resources are limited or battery-operated.

As depicted in FIG. 17, a moving platform 17004 may be configured with one or more sensors 17002, such as the various sensors (cameras, lidar, depth sensors, etc.) described herein. The sensor input from these sensors may be processed through an initial processing stage 17006. The sensor input may be organized, for example, as a sequence of images. The initial processing stage 17006 may identify portions of the images as containing one or more objects of interest. These portions may be, for example, rectangular patches (see, e.g., FIG. 3) that include the one or more objects of interest.

The initial processing stage 17006 may provide the image portions to the culling or filtering stage 17016, in which a rule may be applied to the data to determine the subset of data on which further processing is to be performed. As described with reference to FIGS. 14 to 16, the further processing may be displaying the data to a human user to get feedback and/or using the data to generate a training set for further training of the initial processing stage 17006. For example, the further processing may display subsets of images to a human reviewer present in the field near the moving platform 17004 using a communication connection 17008. The communication connection 17008 may be a wired connection such as Ethernet, a short-range wireless connection, or a wireless local area network connection. For example, a farmer 17012 operating a tractor equipped with a computer on which an ML algorithm is running may be able to, in real-time, view various agricultural objects detected by the ML algorithm and provide instantaneous feedback regarding the accuracy of object identification. The farmer's feedback may be collected and used for subsequent training of the ML algorithm.

As depicted in FIG. 17, alternatively or in addition, the further processing, as depicted in diagram 2700, may include providing a subset of image data over a network 2710 to a viewing device 2714 of a human reviewer 2718, or providing a subset of image data over a network to a viewing device 2708 of a human reviewer 2712. A typical use case may be that the human reviewer 2718 is located at an off-site location and may be monitoring, either in real-time or non-real-time, the performance of the initial processing stage 2706 in the field. The human reviewer 2718 may provide feedback about the accuracy of the detected one or more objects of interest. In one example, the human reviewer 2712 may be present in the field where the sensor input is captured and may thus be aware of ambient conditions such as ambient light, while the human reviewer 2718 may be off-site, not directly exposed to the same ambient conditions, and may in fact be situated to review data collected from different field locations and different moving platforms 2704 and component treatment modules 2702. In one example, a human reviewer 2712 may be observing the treatment module 2702 perform in real time in an agricultural environment. The subset of image data can be further processed in image processing stage 2716 such that detections of a certain object type above or below a certain criterion (e.g., a confidence threshold, a number of detections in a given frame, etc.) can be sent to reviewer 2712 or 2718. In one example, if a series of incorrect detections is being displayed to reviewer 2712 or reviewer 2718, a human can cease operations. In another example, if a reviewer, for example reviewer 2718, sees in real time that, at the image processing stage 2716, the treatment module 2702 is not sending any subset of data via the network 2710, the reviewer 2718 may infer that there are no plant objects, or target objects, or both, in the vicinity of the moving platform 2704 in the particular region the moving platform 2704 is operating in at the moment. In this example, the reviewer 2718 can indicate for the moving platform 2704 to move faster (either autonomously or by signaling an operator of the moving platform 2704 to speed up). Finally, the feedback from the reviewers may be used to provide further training and updates of the ML model used in the initial processing, as indicated by the downward feedback line.

It will be appreciated that the above-disclosed arrangement may operate like a “flywheel” that grabs a large amount of data from sensors and, in real-time, produces a smaller amount of data that contains objects of interest, presenting this information for further processing. Depending on operational conditions, such a scheme may reduce the volume of data used for subsequent training, compared to the volume of data captured by the sensors, by a factor of 100 up to over a million.

In some embodiments, a computer-implemented method of sensor input processing (e.g., method 2400 depicted in FIG. 14) may include performing (2410), using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard the vehicle; identifying (2420), according to a rule, a subset of data resulting from the ML processing; and generating (2430) the subset of data for modifying the ML processing for a subsequent use.

In some embodiments, the modifying the ML processing for the subsequent use comprises training an ML model based on a training set generated from the subset of data.

In some embodiments, the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; providing, based on the determining, the portion of the particular image on a user interface of a user device that is different from the vehicle; receiving, after the providing, an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.

In some embodiments, the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; providing, based on the determining, a real-world location indication for the portion of the particular image; receiving, after the providing, an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.

In some embodiments, the training set is generated as a subset of the sensor input using active learning, in which the subset of the sensor input is generated using an exclusion criterion that excludes sensor input images similar to other sensor input images that were previously user-verified, or an inclusion criterion that includes sensor input images having an aggregate low confidence level. In one example, the inclusion criterion can be input images in which the ML detector detects objects of a certain type close to each other in the real world.
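The exclusion and inclusion criteria just described might look like the following sketch. The embedding and similarity helpers, and both thresholds, are hypothetical assumptions for illustration.

```python
# Hedged sketch of active-learning selection: exclude near-duplicates of
# user-verified images, include images with low aggregate confidence.
# embed and similarity are hypothetical helpers; thresholds are assumed.
def select_for_training(candidates, verified_embeddings, embed, similarity,
                        sim_threshold=0.95, conf_threshold=0.50):
    chosen = []
    for image, detections in candidates:
        vector = embed(image)
        # Exclusion criterion: skip images similar to verified ones.
        if any(similarity(vector, v) > sim_threshold for v in verified_embeddings):
            continue
        # Inclusion criterion: keep images whose mean confidence is low.
        confs = [d["confidence"] for d in detections]
        if confs and sum(confs) / len(confs) < conf_threshold:
            chosen.append(image)
    return chosen
```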

In some embodiments, the rule defines the subset of data to include portions of images obtained from the sensor input at a predetermined time interval.

In some embodiments, the rule defines the subset of data to include every Nth image of the sensor input, where N is a positive integer. For example, based on operational conditions such as resource availability or the density of agricultural objects, the number N may be pre-defined or selected at run-time. For example, fewer images (e.g., a greater value of N) may be processed in a low-density agricultural environment. In some embodiments, the rule defines the subset of data to include portions of images obtained from the sensor input after a predetermined physical movement of the vehicle.

In some embodiments, the rule defines the subset of data to include portions of images according to a detection criterion upon the ML processing.

In some embodiments, the generating the subset of data comprises adding bounding boxes or syntax elements or pixels to objects that are detected in images obtained from the sensor input.

In some embodiments, the method further includes identifying one or more target objects based on the ML processing, and performing a treatment action on the one or more target objects by controlling a treatment mechanism mounted on the vehicle.

In some embodiments, the vehicle is an agricultural vehicle operating in an agricultural environment, and wherein the user interface is displayed on a device that is at a different geographic location than the agricultural environment.

In some embodiments, a treatment module configured to perform a treatment action is disposed on the vehicle and wherein the rule defines the subset as a portion of the sensor input that is expected to include the treatment action on a target object.

In some embodiments, upon detecting absence of the treatment action in the portion of the sensor input that is expected to include the treatment action on the target object, information is sent to the user interface to enable a human intervention.

Examples of Real-Time User Control

1. A processor-implemented method (e.g., method 2500 depicted in FIG. 15), comprising: performing (2510), using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard the vehicle; identifying (2520), according to a rule, a subset of data resulting from the ML processing; and generating and displaying (2530), in real-time, the subset of data to a user interface, thereby enabling a user interaction with the subset of data.

In some embodiments, the vehicle is an agricultural vehicle operating in an agricultural environment, and wherein the user interface is of a display device mounted on the vehicle or of a portable electronic device controlled by a field operator in the agricultural environment.

In some embodiments, the rule specifies that the subset of data resulting from the ML processing comprises portions of images obtained from the sensor input, wherein the portions of images include one or more target objects subject to a treatment by a treatment mechanism under control of the processor.

In some embodiments, the rule defines the subset of data to include portions of images obtained from the sensor input at a predetermined time interval.

In some embodiments, the rule defines the subset of data to include portions of images obtained from the sensor input after a predetermined physical movement of the vehicle.

In some embodiments, the rule defines the subset of data to include portions of images according to a detection criterion upon the ML processing.

In some embodiments, the vehicle is an agricultural vehicle operating in an agricultural environment, and the user interface is displayed on a device that is at a different geographic location than the agricultural environment. For example, as depicted in FIG. 4D, the computing device on which the user interface is displayed may be communicating with the vehicle through a cloud network and may be located in a geographic location far away from a farm.

In some embodiments, a treatment module configured to perform a treatment action is disposed on the vehicle. The rule may define the subset as a portion of the sensor input that is expected to include the treatment action on a target object. For example, the treatment module may be controlled to perform a treatment action on a particular target object, and therefore the sensors may be controlled to capture images near the target object.

In some embodiments, upon detecting absence of the treatment action in the portion of the sensor input that is expected to include the treatment action on the target object, information is sent to the user interface to enable a human intervention. This information may include, for example, the location of the target object where a treatment action failed, such that a human may be able to physically walk over to this location to find out why treatment was not performed.

Examples of Selective Data Culling for Training

In some embodiments, a processor-implemented method (e.g., method 2600 depicted in FIG. 16) may include performing (2610), using a processor onboard a vehicle, a Machine Learning (ML) processing on sensor input from sensors onboard the vehicle; identifying (2620), according to a rule, a subset of data resulting from the ML processing; and providing (2630) the subset of data to a user interface.

In some embodiments, the sensor input comprises image data, wherein the ML processing comprises annotating patches of the image data using annotations associated with each patch, wherein each annotation includes a confidence number associated with a confidence level with which the ML processing has identified an object of interest in a corresponding patch of image data.

In some embodiments, the annotation comprises adding bounding boxes to objects detected in the image data.

In some embodiments, the annotation comprises adding a syntax element or a pixel to objects detected in the images. In some embodiments, the annotation comprises a yes/no label (e.g., a true/false selection for a specific object type) or a selection from a pre-determined set of selections (e.g., this is a weed, this is a crop, this is soil, this is nothing, etc.).

In some embodiments, the method further includes: receiving user feedback on the user interface for the subset of data; generating, based on the user feedback, a training set for further training of an ML model used by the ML processing; and training the ML model using the training set for future use.

In some embodiments, the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; providing, based on the determining, the portion of the particular image on the user interface of a user device that is at a different geographic location than a location of the vehicle; receiving, after the providing, an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.

In some embodiments, the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; providing, based on the determining, a real-world location indication for the portion of the particular image on the user interface; receiving, after the providing, an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.

In some embodiments, the training set is generated as the subset of the image data using active learning in which the subset of images is generated using an exclusion criterion that excludes images similar to images that were previously user-verified, or an inclusion criterion that includes images having an aggregate low confidence level.

In some embodiments, the subset of data includes non-sequential frames of the sensor input and the training set includes intermediate frames between the non-sequential frames, wherein the intermediate frames are used as the training set by propagating user feedback received for the non-sequential frames. For example, every Nth frame may be processed by the ML algorithm or user feedback, and a tracking algorithm may track the decisions for the intermediate frames between the N frames. Here, N may be a positive integer with a typical value between 1 and 100.
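A sketch of propagating every-Nth-frame feedback across intermediate frames follows. The tracker factory and the feedback lookup are hypothetical placeholders for whatever tracking algorithm and review mechanism are used.

```python
# Hedged sketch: user feedback is collected on every Nth frame, and a
# tracker carries the decision across the intermediate frames so they can
# join the training set. make_tracker and feedback_for are hypothetical.
def propagate_feedback(frames, feedback_for, N, make_tracker):
    training_set = []
    for start in range(0, len(frames) - 1, N):
        label, box = feedback_for(start)            # reviewed keyframe
        tracker = make_tracker(frames[start], box)
        for frame in frames[start + 1:start + N]:
            box = tracker.update(frame)             # follow object between keyframes
            training_set.append((frame, label, box))
    return training_set
```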

In some embodiments, the vehicle is operating in an agricultural environment and wherein the sensors comprise one or more of a depth sensor, a light detection and ranging (LiDAR) sensor, or an infrared camera.

In some embodiments, the vehicle is operating in an agricultural environment, and wherein the user interface is located on a user device in the agricultural environment.

In some embodiments, the vehicle is operating in an agricultural environment, and wherein the user interface is located on a user device that is not in the agricultural environment.

Examples of Image Data Handling

In some embodiments, each treatment module may be configured with one or more cameras that are configured to capture images of the surrounding agricultural area. A camera may be configured to capture images at a first resolution and a first frame rate and transfer the images to the processing circuitry onboard the treatment system. In some embodiments, the first resolution and/or the first frame rate may be the highest resolution or frame rate that the entire system is designed to handle. For example, in some embodiments, the cameras may be configured to capture the agricultural images at 8K resolution and 2400 frames per second.

In some embodiments, the image data may be handled in the following operations (a consolidated sketch follows the list):

Operation 1: An image may be captured at a first resolution.

Operation 2: The captured image is stored in a memory at the captured resolution.

Operation 3: N downsampled versions of the image are generated. N is a positive integer. The downsampling ratio may be in multiple powers of 2, e.g., 2, 4, 8 or greater in both X and Y directions.

Operation 4: A downsampled version is made available to a process that identifies one or more areas of interest within the downsampled version. Compared to the full resolution image at the first resolution, the downsampled version may include 1/64th or 1/128th the amount of pixel data.

Operation 5: A request to fetch additional data from the original captured image based on the processing of the downsampled image data is received. The request may identify an (x, y) location of a top-left pixel location in the downsampled image data and a shape or other description of the region of interest.

Operation 6: An inverse mapping of the requested image data may be performed and the corresponding data from the original image may be sent across a memory bus to an image processing section for further processing. The remaining image data may be deleted from the memory or marked for reuse.

Operation 7: At some time during operations 5 and 6, a next image at the first resolution may be captured and stored in the memory.
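Operations 1 through 6 above might be consolidated as in this sketch. The ROI detector and the downsampling factor of 8 are illustrative assumptions; OpenCV is assumed for the resize.

```python
# Hedged sketch consolidating operations 1-6: keep the native capture in
# memory, find regions of interest on a downsampled copy, then inverse-map
# each request to fetch only the matching native-resolution pixels.
# roi_detect and the factor of 8 are assumptions for illustration.
import cv2

def fetch_native_rois(native_image, roi_detect, factor=8):
    small = cv2.resize(native_image, None, fx=1.0 / factor, fy=1.0 / factor)

    patches = []
    for (x, y, w, h) in roi_detect(small):   # ROIs in downsampled coordinates
        # Operation 6: inverse-map to native coordinates; only this slice
        # needs to cross the memory bus for further processing.
        patches.append(native_image[y * factor:(y + h) * factor,
                                    x * factor:(x + w) * factor])
    return patches
```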

Examples of Image Processing at Different Levels of Detail

As previously described with reference to FIG. 17, upon reception of images at the first frame rate and first resolution, a downsampling function of the processing circuitry may generate a downsampled version of the images. In some embodiments, the downsampling may be performed using a pre-defined set of filters and a pre-defined set of downsampling factors. In some embodiments, a specific downsampling filter and/or downsampling factor may be selected and provided to the downsampling function based on operational decisions as disclosed herein. Briefly, these decisions may be based on ambient light, the type of crop being raised in the agricultural farm, and so on.

In some embodiments, the multi-resolution image processing may comprise separating the captured image into different colors. For example, an image captured with a 24-bit pixel value representation may be separated into three lower-detail images, one per color, each having 8-bit pixels.

In some embodiments, the captured frame detail (e.g., the resolution) may be greater than the image detail used for identifying objects within the frame.

In some embodiments, multiple machine learning (ML) models, each trained and configured to operate at a different resolution, may be executed on the treatment system. For example, a first ML algorithm may be used on a reduced detail image.

Examples of Using Multiple ML Models at Multiple Levels of Detail

In one example, ML model 1 may be trained to detect certain agricultural objects (e.g., any of weeds, crop, or soil) at a first resolution. ML model 1 may detect a crop with high certainty (e.g., above a threshold 1). If the detection falls below threshold 1, then the detection may be reported for further processing (see, e.g., FIG. 8B). The same image region may be sent as an input to ML model 2. In different embodiments, ML model 2 may be configured to operate at different levels of detail, e.g., a lower resolution, a lower pixel height/width, or a different layer of the image (e.g., a single color or a different number of bits per pixel). ML model 2 may be trained to classify the object in the image region as a soil object, a crop object, or a weed object. Depending on the classification of the type of object, the result and the image region may be uploaded for further processing. For example, all findings of a “soil” object type may be uploaded in one embodiment, while all findings of a “weed” object type may be uploaded in another embodiment. In general, which types of object detections to upload may be programmed according to operational criteria. The uploaded image region and ML classifications may be used for further training, or for visual classification by a human or an independent ML model (different from ML1 and ML2).
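Since which detections to upload is an operational criterion rather than fixed logic, it can be expressed as configuration, as in this hedged sketch. The set contents and the upload callable are hypothetical.

```python
# Hedged sketch: the set of ML2 classifications to upload is configuration.
# One deployment uploads "soil" findings, another "weed" findings. The
# upload callable is a hypothetical placeholder.
UPLOAD_TYPES = {"soil"}   # swap for {"weed"} in another embodiment

def maybe_upload(image_region, ml2_label, upload):
    if ml2_label in UPLOAD_TYPES:
        # Uploaded regions feed further training or human/ML review.
        upload(image_region, ml2_label)
```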

Soil Detection Embodiments

In some embodiments, the ML models used herein may be used for detecting a certain type of soil and/or gauging the amount of moisture in the soil. For example, ML models may be trained to identify various soil types and soil textures based on the amount of water in the soil. In some embodiments, soil may be treated in a similar manner as other agricultural objects (e.g., training ML models at a certain level of image detail and/or using human feedback during training). In some embodiments, various soil textures may be treated separately, resulting in identification of multiple types of soil objects, e.g., “wet soil,” “tilled soil,” etc. In some embodiments, recognizing that soil may appear to have a large variation in texture, e.g., top soil, fertilized soil, soil with tire marks, and so on, the various different identifications may all simply be labeled as “soil” and treated as such in subsequent agricultural operations. In some embodiments, soil detection via image processing may be combined with detection of other agricultural objects to further increase a confidence level.

In some embodiments, soil detection models, and other ML models, may be trained by sending a subset of images for human input. Such input may be used to train parameters of an ML model related to soil texture, soil color shade, whether a contour or pattern in soil represents a crack, a root, or another object, and so on. In some embodiments, every image region where a soil crack is detected may be sent for human verification about whether the crack is indeed a crack, or is a weed, a leaf, or a root, e.g., a baby carrot leaf.

Example Embodiments of Capture and Identification of Target Action

As disclosed with reference to FIGS. 4A-4D, in some embodiments, a treatment action may be captured by sensors and automatically identified using the sensor input processing techniques disclosed herein. FIG. 18 is a flowchart for an example method 2800 of sensor input processing. The method 2800 may be implemented by the agricultural platform or vehicle disclosed herein.

At 2810, sensor images of a vicinity of a target object may be captured during a time interval during which a treatment is applied to the target object. The number of captured images may vary from a single image up to several tens or hundreds of images that show the treatment action.

At 2820, the one or more sensor images are processed using one or more machine learning (ML) algorithms, wherein at least one ML algorithm uses an ML model trained to detect a presence of a treatment action in the vicinity of the target object.

At 2830, selectively based on a result of detecting the presence of the treatment action in the vicinity of the target object, an outcome of the processing is provided for further processing.

In some embodiments, the processing is performed using at least two ML algorithms, including a first ML algorithm trained to process sensor images at a greater resolution than a second ML algorithm that is trained to detect the presence of the treatment action in the vicinity of the target object. For example, the first ML algorithm may analyze a larger image, such as a 768×768 or 4K image, while the second ML algorithm may operate on a smaller image. For example, as depicted in FIGS. 4A to 4D, a segment or region of an image that possibly includes an object of interest may be processed by the second ML algorithm. Some additional examples are disclosed with reference to FIG. 17.

In some embodiments, the processing is performed using at least two ML algorithms, including a first ML algorithm trained to process sensor images at a lower resolution than a second ML algorithm that is trained to detect the presence of the treatment action in the vicinity of the target object. For example, the first ML algorithm may analyze a downsampled, smaller version of a full-resolution (e.g., 8K) image to identify portions or segments of the image where objects are detected and eliminate portions where no objects are seen. The second ML algorithm may then operate on a higher-resolution version corresponding to the segment or portion identified by the first ML algorithm in the downsampled image. Some examples are disclosed with reference to FIG. 17.

In some embodiments, when the result of detecting indicates that the treatment action is detected from the sensor images, the outcome is provided to a database for indicating a success of the treatment. For example, the database 1126 may be used for storing the results at an offsite location. Alternatively, or in addition, the database may simply be in the form of local storage at the computing device 1120.

In some embodiments, when the result of detecting indicates that the treatment action is not detected from the sensor images, the outcome is provided to a user interface as a treatment error. For example, the user interface of the computing device 1120 may be used for displaying the treatment error, and feedback may be sought from a human operator in the agricultural environment in real-time, e.g., a corrective action by the user may be requested.

In various embodiments, the treatment action comprises ejection of a fluid towards the target object or emission of a laser beam towards the target object. Other possible actions include providing a fertilizer to a crop, or strewing seeds in soil, and so on.

In some embodiments, the method 2800 further includes providing the sensor images for the further processing. In different embodiments, the further processing may include one or more of the following: performing further training of an ML model for detection of treatment actions, providing a playback loop to play the captured images, or providing corrective feedback to the treatment mechanism in case an expected treatment action is not observed. For example, this may occur when no treatment action is detected when one was expected, when the treatment action was in excess of the intended treatment action, when the treatment action failed to reach the target object, and so on.

In various embodiments disclosed throughout the present document, a sensor input may comprise sensor data (e.g., a digital representation) or a sensor signal (e.g., an analog value) that is used for subsequent processing. Furthermore, the various machine-based image processing algorithms may label various objects detected in images or simply draw inferences about the objects. Similarly, object labeling may also be performed by a human user. For example, a machine learning method may draw inferences about objects in a sensor input frame, which may then be labeled to be a particular object based on human feedback such as yes or no. For example, a machine may draw an inference that an object is a weed with a 70% confidence, and a human user may provide a “yes” feedback, resulting in that object being labeled as a weed. In general, a first confidence threshold may be used for submitting objects to human labeling, while a second confidence threshold may be used to accept the detected object as being so. For example, all object detections between 40% and 90% may be submitted for human feedback, while object detections above 90% confidence may be accepted as being the detected object without requiring additional human input.
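The two-threshold arrangement in the preceding paragraph might be expressed as follows, with the 40% and 90% bounds taken from the example above; the function and return labels are illustrative assumptions.

```python
# Hedged sketch of the dual-threshold rule described above: detections in
# the 40%-90% band go to a human for yes/no labeling; those above 90% are
# accepted as the detected object without additional human input.
def route_detection(label, confidence, lower=0.40, upper=0.90):
    if confidence > upper:
        return ("accept", label)      # trusted without human feedback
    if confidence >= lower:
        return ("ask_human", label)   # human yes/no converts this to a label
    return ("discard", label)         # too uncertain to submit
```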

It will be appreciated that the present document discloses various techniques for processing of sensor inputs in an agricultural environment. In one beneficial aspect, analysis of such sensor inputs is used to automate certain agricultural operations such as weed elimination, cataloging of crops, sowing seeds, adding fertilizers, and so on. In another beneficial aspect, an interface is provided for a human to interact with machine learning to be able to supervise and make corrections to certain automated tasks.

CONCLUSION

Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing unit” or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

It is intended that the specification, together with the drawings, beconsidered exemplary only, where exemplary means an example. As usedherein, the use of “or” is intended to include “and/or”, unless thecontext clearly indicates otherwise.

While this patent document contains many specifics, these should not beconstrued as limitations on the scope of any invention or of what may beclaimed, but rather as descriptions of features that may be specific toparticular embodiments of particular inventions. Certain features thatare described in this patent document in the context of separateembodiments can also be implemented in combination in a singleembodiment. Conversely, various features that are described in thecontext of a single embodiment can also be implemented in multipleembodiments separately or in any suitable subcombination. Moreover,although features may be described above as acting in certaincombinations and even initially claimed as such, one or more featuresfrom a claimed combination can in some cases be excised from thecombination, and the claimed combination may be directed to asubcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particularorder, this should not be understood as requiring that such operationsbe performed in the particular order shown or in sequential order, orthat all illustrated operations be performed, to achieve desirableresults. Moreover, the separation of various system components in theembodiments described in this patent document should not be understoodas requiring such separation in all embodiments.

Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

CLAIMS

1. A processor-implemented method, comprising: performing, using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard the vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and generating and displaying, in real-time, the subset of data to a user interface, thereby enabling a user interaction with the subset of data.
2. The method of claim 1, wherein the vehicle is an agricultural vehicle operating in an agricultural environment, and wherein the user interface is of a display device mounted on the vehicle or of a portable electronic device controlled by a field operator in the agricultural environment.
3. The method of claim 1, wherein the rule specifies that the subset of data resulting from the ML processing comprises portions of images obtained from the sensor input, wherein the portions of images include one or more target objects subject to a treatment by a treatment mechanism under control of the processor.
6. The method of claim 1, wherein the rule defines one or more of: the subset of data to include portions of images obtained from the sensor input at a predetermined time interval; the subset of data to include portions of images obtained from the sensor input after a predetermined physical movement of the vehicle; or the subset of data to include portions of images according to a detection criterion upon the ML processing.
7. The method of claim 1, wherein the vehicle is an agricultural vehicle operating in an agricultural environment, and wherein the user interface is displayed on a device that is at a different geographic location than the agricultural environment.
8. The method of claim 1, wherein a treatment module configured to perform a treatment action is disposed on the vehicle and wherein the rule defines the subset as a portion of the sensor input that is expected to include the treatment action on a target object.
9. The method of claim 8, wherein, upon detecting absence of the treatment action in the portion of the sensor input that is expected to include the treatment action on the target object, information is sent to the user interface to enable a human intervention.
10. A processor-implemented method, comprising: performing, using a processor onboard a vehicle, a Machine Learning (ML) processing on sensor input from sensors onboard the vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and providing the subset of data to a user interface.
11. The method of claim 10, wherein the sensor input comprises image data, wherein the ML processing comprises annotating patches of the image data using annotations associated with each patch, wherein each annotation includes a confidence number associated with a confidence level with which the ML processing has identified an object of interest in a corresponding patch of image data.
12. The method of claim 11, wherein the annotation comprises one or more of: adding bounding boxes to objects detected in the image data; adding a syntax element or a pixel to objects detected in the images; or a yes/no selection or a selection from a pre-determined set of selections.
13. The method of claim 10, further including: receiving user feedback on the user interface for the subset of data; generating, based on the user feedback, a training set for further training of an ML model used by the ML processing; and training the ML model using the training set for future use.
14. The method of claim 13, wherein the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; presenting the portion of the particular image on the user interface of a user device that is at a different geographic location than a location of the vehicle; receiving an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.
15. The method of claim 13, wherein the training set is generated by: determining that a particular confidence number associated with a portion of a particular image is below a threshold; presenting a real-world location indication for the portion of the particular image on the user interface; receiving an input on the user interface, wherein the input indicates a mode of further processing the particular image; and selectively including, based on the input on the user interface, the particular image in the training set.
16. The method of claim 13, wherein the training set is generated as the subset of the image data using active learning in which the subset of images is generated using an exclusion criterion that excludes images similar to images that were previously user-verified, or an inclusion criterion that includes images having an aggregate low confidence level.
17. The method of claim 13, wherein the subset of data includes non-sequential frames of the sensor input and the training set includes intermediate frames between the non-sequential frames, wherein the intermediate frames are used as the training set by propagating user feedback received for the non-sequential frames.
18. The method of claim 1, wherein the vehicle is operating in an agricultural environment and wherein the sensors comprise one or more of a depth sensor, a light detection and ranging (LiDAR) sensor, or an infrared camera.
19. The method of claim 1, wherein the vehicle is operating in an agricultural environment, and wherein the user interface is located on a user device in the agricultural environment.
20. A computer-implemented method of sensor input processing, comprising: performing, using a processor onboard a vehicle, a machine learning (ML) processing on sensor input from sensors onboard the vehicle; identifying, according to a rule, a subset of data resulting from the ML processing; and generating the subset of data for modifying the ML processing for a subsequent use.
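
By way of non-limiting illustration only, the following sketch shows one possible realization of the confidence-gated feedback loop recited in claims 11, 13, and 14: annotated patches whose confidence numbers fall below a threshold are routed to a user interface for review, and the resulting user feedback is selectively folded into a training set for retraining the ML model. All identifiers (Patch, select_for_review, record_feedback, build_training_set) and the threshold value are hypothetical and chosen purely for exposition; an actual implementation would depend on the ML framework, sensors, and treatment hardware in use.

    # Illustrative sketch only; all names and values are hypothetical.
    from dataclasses import dataclass
    from typing import List, Optional

    CONFIDENCE_THRESHOLD = 0.6  # hypothetical cutoff below which a patch is reviewed

    @dataclass
    class Patch:
        """A patch of image data annotated by the ML processing (claim 11)."""
        image_id: str
        bbox: tuple                          # (x, y, w, h) bounding box annotation
        label: str                           # e.g., "weed" or "crop"
        confidence: float                    # confidence number from the ML model
        user_verdict: Optional[bool] = None  # filled in after human review

    def select_for_review(patches: List[Patch],
                          threshold: float = CONFIDENCE_THRESHOLD) -> List[Patch]:
        """Rule-based subset identification: keep low-confidence patches only."""
        return [p for p in patches if p.confidence < threshold]

    def record_feedback(patch: Patch, user_says_correct: bool) -> None:
        """Store the verdict received on the user interface (claim 13)."""
        patch.user_verdict = user_says_correct

    def build_training_set(patches: List[Patch]) -> List[Patch]:
        """Selectively include reviewed patches in the training set (claim 14)."""
        return [p for p in patches if p.user_verdict is not None]

    detections = [
        Patch("frame_001", (10, 20, 32, 32), "weed", 0.92),
        Patch("frame_001", (80, 40, 32, 32), "weed", 0.41),
        Patch("frame_002", (15, 60, 32, 32), "crop", 0.55),
    ]
    for patch in select_for_review(detections):
        # A real system would present the patch on the user interface;
        # here every low-confidence detection is simply accepted as correct.
        record_feedback(patch, user_says_correct=True)
    print(f"{len(build_training_set(detections))} patches selected for retraining")

Modeling the review queue and the training set as filtered views of the same list of patches mirrors the claim language: the rule identifies a subset of the ML output, and the user feedback selectively promotes members of that subset into the training set.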
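
Similarly hypothetical, the next sketch illustrates the feedback-propagation idea of claim 17: user feedback given on non-sequential keyframes is copied onto the intermediate frames between them, so that the intermediate frames can also serve as training data without being individually reviewed. The frame and keyframe structures, and the nearest-preceding-keyframe rule, are invented for this example and represent only one possible reading of the claim.

    # Illustrative sketch only; the frame structures are hypothetical.
    from typing import Dict, List

    def propagate_feedback(keyframe_feedback: Dict[int, str],
                           frame_indices: List[int]) -> Dict[int, str]:
        """Copy user feedback from reviewed keyframes onto the intermediate
        frames between them (one possible reading of claim 17)."""
        labeled: Dict[int, str] = {}
        keyframes = sorted(keyframe_feedback)
        for idx in frame_indices:
            # Inherit the feedback of the nearest preceding keyframe, if any.
            preceding = [k for k in keyframes if k <= idx]
            if preceding:
                labeled[idx] = keyframe_feedback[preceding[-1]]
        return labeled

    # Keyframes 0 and 30 were reviewed; frames 1..29 inherit frame 0's verdict.
    feedback = {0: "weed confirmed", 30: "false positive"}
    training_labels = propagate_feedback(feedback, list(range(0, 31)))
    print(training_labels[15])  # -> "weed confirmed"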